Sketching Distributed Data Provenance
نویسندگان
چکیده
Users can determine the precise origins of their data by collecting detailed provenance records. However, auditing at a finer grain produces large amounts of metadata. To efficiently manage the collected provenance, several provenance management systems, including SPADE, record provenance on the hosts where it is generated. Distributed provenance raises the issue of efficient reconstruction during the query phase. Recursively querying provenance metadata or computing its transitive closure is known to have limited scalability and cannot be used for large provenance graphs. We present matrix filters, which are novel data structures for representing graph information, and demonstrate their utility for improving query efficiency with experiments on provenance metadata gathered while executing distributed workflow applications.
منابع مشابه
A Provenance-Integration Framework for Distributed Workflows in Grid Environments
Provenance information about complex and distributed workflows is a key issue for data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments where different workflows consume and transform data require a comprehensive provenance view. In this scenario provenance collection and integration presents significant challenges. In this paper,...
متن کاملA Distributed Provenance Aware Storage System
The provenance of a file represents the origin and history of the file data. A Distributed Provenance Aware Storage System (DPASS) tracks the provenance of files in a distributed file system. The provenance information can be used to identify potential dependencies between files in a filesystem. Some applications of provenance tracking include (i) tracking the transformations applied to process...
متن کاملData Provenance in Distributed Propagator Networks
Existing distributed programs often require provenance to be included in the design of the distributed computing framework. Distributed programs making use of data propagation do not have this restriction; propagator networks allow non-provenance-aware applications to be easily transformed into provenance-aware forms by simply modifying existing program structure.
متن کاملA Formal Model of Provenance in Distributed Systems
We present a formalism for provenance in distributed systems based on the π-calculus. Its main feature is that all data products are annotated with metadata representing their provenance. The calculus is given a provenance tracking semantics, which ensures that data provenance is updated as the computation proceeds. The calculus also enjoys a pattern-restricted input primitive which allows proc...
متن کاملFusionProv: Towards a Provenance-Aware Distributed Filesystem
It has become increasingly important to capture and understand the origins and derivation of data (its provenance). A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability. In this paper, we explore the feasibility of a management layer for parallel file systems, in which metadata includes both file operations and provenance metadata. We desig...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012